{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "MobileNet_implementation.ipynb",
      "provenance": [],
      "collapsed_sections": [],
      "authorship_tag": "ABX9TyPuybkb0iw5W+mpU5SKD8iO",
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/Machine-Learning-Tokyo/CNN-Architectures/blob/master/Implementations/MobileNet/MobileNet_implementation.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "iYVSkXFfO0du",
        "colab_type": "text"
      },
      "source": [
        "# Implementation of MobileNet\n",
        "\n",
        "<br><span style=\"font-size:14pt;\">We will use the tensorflow.keras Functional API to build MobileNet</span>\n",
        "(https://arxiv.org/pdf/1704.04861.pdf)\n",
        "\n",
        "---\n",
        "\n",
        "In the paper we can read:\n",
        "\n",
        ">**[i]** “All layers are followed by a batchnorm and ReLU nonlinearity [...]”.\n",
        "\n",
        "<br>\n",
        "\n",
        "\n",
        "We will also make use of the following Table **[ii]**:\n",
        "\n",
        "<img src=https://raw.githubusercontent.com/Machine-Learning-Tokyo/DL-workshop-series/master/Part%20I%20-%20Convolution%20Operations/images/MobileNet/MobileNet.png width=\"500\">\n",
        "\n",
        "<br>\n",
        "\n",
        "as well the following Diagram **[iii]** of the *Mobilenet block*:\n",
        "\n",
        "<img src=https://raw.githubusercontent.com/Machine-Learning-Tokyo/DL-workshop-series/master/Part%20I%20-%20Convolution%20Operations/images/MobileNet/MobileNet_block.png width=\"150\">\n",
        "\n",
        "## Network architecture\n",
        "\n",
        "The network starts with a (Conv, BatchNorm, ReLU) block and continues with a series of **Mobilenet blocks** before the final *Avg Pool* and *Fully Connected* layers.\n",
        "\n",
        "<br/>\n",
        "\n",
        "\n",
        "### Mobilenet block\n",
        "\n",
        "The *Mobilenet block* is depicted at the bottom right figure of the above image. Specifically it consists of 6 layers:\n",
        "\n",
        "1. a 3x3 *Depth Wise Convolution* layer\n",
        "2. a *Batch Normalization* layer (**[i]**)\n",
        "3. a *Rectified Linear Unit (ReLU)* activation layer (**[i]**)\n",
        "4. a 1x1 *Convolutional* layer\n",
        "5. a *Batch Normalization* layer\n",
        "6. a *Rectified Linear Unit (ReLU)* activation layer\n",
        "\n",
        "---\n",
        "\n",
        "## Workflow\n",
        "We will:\n",
        "1. import the neccesary layers\n",
        "2. write a helper function for the MobileNet block **[iii]**\n",
        "3. build the stem of the model\n",
        "4. use these helper functions to build main part of the model.\n",
        "\n",
        "---\n",
        "\n",
        "### 1. Imports"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "fMiSudoSRUoo",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "from tensorflow.keras.layers import Input, DepthwiseConv2D, \\\n",
        "     Conv2D, BatchNormalization, ReLU, AvgPool2D, Flatten, Dense"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "54DZcgWaRUNx",
        "colab_type": "text"
      },
      "source": [
        "### 2. MobileNet block\n",
        "Next, we will build the *Mobilenet block* as a function that will\n",
        "- take as input:\n",
        "  - a tensor (**`x`**)\n",
        "  - the number of filters for the Convolutional layer (**`filters`**)\n",
        "  - the strides for the Depthwise Convolutional layer (**`strides`**)\n",
        "- run:\n",
        "    - apply a 3x3 *Depthwise Convolution layer* with **`strides`** strides followed by a *Batch Normalization* and a *ReLU* activation\n",
        "    - apply a 1x1 *Convolution layer* with **`filters`** filters followed by a *Batch Normalization* and a *ReLU* activation\n",
        "- return the tensor\n",
        "\n",
        "and will return the tensor **`output`**."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "E8zeWzjkRNS_",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "def mobilenet_block(x, filters, strides):\n",
        "    x = DepthwiseConv2D(kernel_size=3, strides=strides, padding='same')(x)\n",
        "    x = BatchNormalization()(x)\n",
        "    x = ReLU()(x)\n",
        " \n",
        "    x = Conv2D(filters=filters, kernel_size=1, strides=1, padding='same')(x)\n",
        "    x = BatchNormalization()(x)\n",
        "    x = ReLU()(x)\n",
        "    return x"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "AK9K2hHQRMF0",
        "colab_type": "text"
      },
      "source": [
        "At **[ii]** one can see the following pattern after the first Convolution layer:\n",
        "- Conv dw/s1\n",
        "- Conv/s1\n",
        "\n",
        "These two lines consist one Mobilenet block.\n",
        "- the number after the *s* of the first line is the strides of the Depthwise layer\n",
        "- the last number of the *Filter Shape* column of the second line is the number of filters of the Convolution layer\n",
        "\n",
        "<br/>\n",
        "\n",
        "For example, this is the first Mobilenet block of **[ii]**:\n",
        "\n",
        "| Type / Stride    \t| Filter Shape \t|\n",
        "|---------------   \t|:------------:\t|\n",
        "| Conv dw/s**1**   \t| 3x3x32 dw    \t|\n",
        "| Conv/s1           | 1x1x32x**64** |\n",
        "\n",
        "The corresponding call of the mobilenet_block() function would be:\n",
        "\n",
        ">`mobilenet_block(x, fitlers=64, strides=1)`\n",
        "\n",
        "---\n",
        "\n",
        "### 3. Stem of the model\n",
        "\n",
        "From **[ii]**:\n",
        "\n",
        "| Type / Stride \t| Filter Shape \t|\n",
        "|---------------\t|:------------:\t|\n",
        "| Conv/s2      \t  | 3x3x3x32    \t|"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "RXW-IxstQ9Ky",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "input = Input(shape=(224, 224, 3))\n",
        "x = Conv2D(filters=32, kernel_size=3, strides=2, padding='same')(input)\n",
        "x = BatchNormalization()(x)\n",
        "x = ReLU()(x)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mo7-yRUrQ8oY",
        "colab_type": "text"
      },
      "source": [
        "### 4. Main part of the model\n",
        "\n",
        "From **[ii]**:\n",
        "\n",
        "\n",
        "| Type / Stride \t| Filter Shape \t|\n",
        "|---------------\t|:------------:\t|\n",
        "| Conv dw/s1    \t| 3x3x32 dw    \t|\n",
        "| Conv/s1       \t| 1x1x32x64    \t|"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "p2_q7AFGQ1RU",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "x = mobilenet_block(x, filters=64, strides=1)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "hV7OFxIZQ0t3",
        "colab_type": "text"
      },
      "source": [
        "From **[ii]**:\n",
        "\n",
        "| Type / Stride \t| Filter Shape \t|\n",
        "|---------------\t|:------------:\t|\n",
        "| Conv dw/s2    \t| 3x3x64 dw    \t|\n",
        "| Conv/s1       \t| 1x1x64x128   \t|\n",
        "| Conv dw/s1    \t| 3x3x128 dw   \t|\n",
        "| Conv/s1       \t| 1x1x128x128  \t|"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "0rDG1akpQr3A",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "x = mobilenet_block(x, filters=128, strides=2)\n",
        "x = mobilenet_block(x, filters=128, strides=1)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XNEp0PyaQky8",
        "colab_type": "text"
      },
      "source": [
        "From **[ii]**:\n",
        "\n",
        "| Type / Stride \t| Filter Shape \t|\n",
        "|---------------\t|:------------:\t|\n",
        "| Conv dw/s2    \t| 3x3x128 dw   \t|\n",
        "| Conv/s1       \t| 1x1x128x256  \t|\n",
        "| Conv dw/s1    \t| 3x3x256 dw   \t|\n",
        "| Conv/s1       \t| 1x1x256x256  \t|"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "-_UzaWiIQdBQ",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "x = mobilenet_block(x, filters=256, strides=2)\n",
        "x = mobilenet_block(x, filters=256, strides=1)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "raoJI00tQcvv",
        "colab_type": "text"
      },
      "source": [
        "From **[ii]**:\n",
        "\n",
        "| Type / Stride       \t \t| Filter Shape         \t\t \t\t\t|\n",
        "|--------------:       \t\t| :-----------:       \t\t \t\t \t|\n",
        "| Conv dw/s2       \t    \t| 3x3x256 dw       \t   \t        |\n",
        "| Conv/s1       \t       \t| 1x1x256x512       \t        \t|\n",
        "| 5x Conv dw/s1<br>Conv/s1| 3x3x512 dw<br>1x1x512x512   \t|"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "W-5KNEqFQORZ",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "x = mobilenet_block(x, filters=512, strides=2)\n",
        "for _ in range(5):\n",
        "    x = mobilenet_block(x, filters=512, strides=1)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fRp2aUzOQM1m",
        "colab_type": "text"
      },
      "source": [
        "From **[ii]**:\n",
        "\n",
        "| Type / Stride \t| Filter Shape  \t|\n",
        "|---------------\t|--------------:\t|\n",
        "| Conv dw/s2    \t| 3x3x512 dw    \t|\n",
        "| Conv/s1       \t| 1x1x512x1024  \t|\n",
        "| Conv dw/s1    \t| 3x3x1024 dw   \t|\n",
        "| Conv/s1       \t| 1x1x1024x1024 \t|\n",
        "| Avg Pool/s1   \t| Pool 7x7      \t|\n",
        "| FC/s1         \t| 1024x1000     \t|\n",
        "| Softmax/s1    \t| Classifier    \t|"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "hcIUtjFMP5rK",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "x = mobilenet_block(x, filters=1024, strides=2)\n",
        "x = mobilenet_block(x, filters=1024, strides=1)\n",
        "\n",
        "x = AvgPool2D(pool_size=7, strides=1)(x)\n",
        "output = Dense(units=1000, activation='softmax')(x)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7jwlpbOPPXxh",
        "colab_type": "text"
      },
      "source": [
        "## Final code"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "ClgtoO-zPVfM",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "from tensorflow.keras.layers import Input, DepthwiseConv2D, \\\n",
        "     Conv2D, BatchNormalization, ReLU, AvgPool2D, Flatten, Dense\n",
        "\n",
        "def mobilenet_block(x, filters, strides):\n",
        "    x = DepthwiseConv2D(kernel_size=3, strides=strides, padding='same')(x)\n",
        "    x = BatchNormalization()(x)\n",
        "    x = ReLU()(x)\n",
        "\n",
        "    x = Conv2D(filters=filters, kernel_size=1, strides=1, padding='same')(x)\n",
        "    x = BatchNormalization()(x)\n",
        "    x = ReLU()(x)\n",
        "    return x\n",
        "\n",
        "INPUT_SHAPE = 224, 224, 3\n",
        "\n",
        "input = Input(INPUT_SHAPE)\n",
        "x = Conv2D(filters=32, kernel_size=3, strides=2, padding='same')(input)\n",
        "x = BatchNormalization()(x)\n",
        "x = ReLU()(x)\n",
        "\n",
        "x = mobilenet_block(x, filters=64, strides=1)\n",
        "\n",
        "x = mobilenet_block(x, filters=128, strides=2)\n",
        "x = mobilenet_block(x, filters=128, strides=1)\n",
        "\n",
        "x = mobilenet_block(x, filters=256, strides=2)\n",
        "x = mobilenet_block(x, filters=256, strides=1)\n",
        "\n",
        "x = mobilenet_block(x, filters=512, strides=2)\n",
        "for _ in range(5):\n",
        "    x = mobilenet_block(x, filters=512, strides=1)\n",
        "  \n",
        "x = mobilenet_block(x, filters=1024, strides=2)\n",
        "x = mobilenet_block(x, filters=1024, strides=1)\n",
        "\n",
        "x = AvgPool2D(pool_size=7, strides=1)(x)\n",
        "output = Dense(units=1000, activation='softmax')(x)\n",
        "\n",
        "from tensorflow.keras import Model\n",
        "\n",
        "model = Model(inputs=input, outputs=output)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "m_Cz54_cPIvz",
        "colab_type": "text"
      },
      "source": [
        "## Model diagram\n",
        "\n",
        "<img src=\"https://raw.githubusercontent.com/Machine-Learning-Tokyo/CNN-Architectures/master/Implementations/MobileNet/MobileNet_diagram.svg?sanitize=true\">"
      ]
    }
  ]
}